CLAIMS 



1. A method for modeling an information request, comprising:

receiving unlabeled and labeled documents; 

extracting a set of features from each document; 

learning a model from example documents marked as positive or negative with respect to
said request wherein said model scores documents to evaluate a degree of membership in a
group responsive to said request;

evaluating the performance of model settings on example documents; 

applying an adjustment algorithm that provides a threshold value \theta_{new};

applying a scoring function that computes score, a value assigned to a document by the
learnt model, and classifies the document based on the sign of the following equation:

Class(X) = Sign(score - \theta_{new}).
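For illustration only, the decision rule in the final step can be sketched in Python as follows. The function name `classify`, the example scores, and the choice to treat a score exactly equal to the threshold as negative are assumptions made here; the claim does not specify them.

```python
def classify(score: float, theta_new: float) -> int:
    """Return +1 if score - theta_new is positive, otherwise -1.

    Assumption: a tie (score == theta_new) is mapped to -1, since the
    claim leaves that case unspecified.
    """
    return 1 if score - theta_new > 0 else -1


# Hypothetical scores, as if produced by some learnt model.
labels = [classify(s, theta_new=0.25) for s in (0.9, 0.25, -0.4)]
```

Only the comparison against the adjusted threshold is shown; how the score and the threshold are obtained is left to the earlier steps.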



2. The method according to claim 1, wherein: 

said model is a support vector machine determined by training data from labeled example
documents.



3. The method according to claim 1, wherein: 

said model comprises a list of terms and weights extracted from labeled example
documents.



4. The method according to claim 1, wherein: 

said model comprises a list of terms and corresponding weights and a threshold value 
determined by: 

extracting terms and features from the positive and negative documents;

ranking terms and features;

selecting a subset of terms and features from the ranked terms and features;
assigning a weight w_i for each term and feature;
setting a threshold \theta for the model to zero.
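The five steps above can be sketched roughly as follows. This is a minimal illustration, not the claimed implementation: documents are assumed to be token lists, the ranking and weighting functions are supplied by the caller, "selecting a subset" is assumed to mean keeping the top k ranked terms, and the name `build_term_model` is hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Doc = List[str]  # assumed representation: a document is a list of tokens


def build_term_model(
    positive: List[Doc],
    negative: List[Doc],
    rank_score: Callable[[str], float],
    weight: Callable[[str], float],
    k: int,
) -> Tuple[Dict[str, float], float]:
    # Extract candidate terms from the positive and negative documents.
    vocab = {t for doc in positive + negative for t in doc}
    # Rank terms in decreasing order of the supplied ranking score
    # (ties broken alphabetically so the result is deterministic).
    ranked = sorted(vocab, key=lambda t: (-rank_score(t), t))
    # Select a subset of the ranked terms: here, the top k (an assumption).
    selected = ranked[:k]
    # Assign a weight w_i to each selected term.
    weights = {t: weight(t) for t in selected}
    # Set the model threshold theta to zero.
    theta = 0.0
    return weights, theta


# Hypothetical data; positive-set term counts stand in for a real ranking score.
pos = [["svm", "kernel"], ["svm", "margin"]]
neg = [["football", "kernel"]]
counts = Counter(t for d in pos for t in d)
model, theta = build_term_model(
    pos, neg, lambda t: counts[t], lambda t: float(counts[t]), k=2
)
```

Passing the ranking and weighting functions as parameters keeps the sketch independent of any particular scoring formula.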






5. The method according to claim 4, wherein: 

said terms and features are ranked in decreasing order of their Rocchio score calculated 
as follows: 



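Rocchio_i = TF_{i,Text} + \alpha \cdot \frac{1}{R} \sum_{D_j \in D_R \,\&\, W_i \in D_j} TF_{ij} - \beta \cdot \frac{1}{N} \sum_{D_j \in D_N \,\&\, W_i \in D_j} TF_{ij}

in which

TF_{i,Text}: the frequency of feature i in the text description of the information need
TF_{ij}: the number of occurrences of feature i in document j
D_R: positive document set
D_N: negative document set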




R: the number of positive documents (i.e., the size of D_R)
N: the number of negative documents 

6. The method according to claim 4, wherein: 

said terms and features are assigned a weight as follows: 

w_i = Rocchio_i \cdot idf_i, calculated as:

Rocchio_i = TF_{i,Text} + \alpha \cdot \frac{1}{R} \sum_{D_j \in D_R \,\&\, W_i \in D_j} TF_{ij} - \beta \cdot \frac{1}{N} \sum_{D_j \in D_N \,\&\, W_i \in D_j} TF_{ij}

where

TF_{i,Text}: the frequency of feature i in the text description of the information need
TF_{ij}: the number of occurrences of feature i in document j
D_R: positive document set
D_N: negative document set

R: the number of positive documents (i.e., the size of D_R)
N: the number of negative documents

and idf_i is calculated as follows:

idf_i = \log_2(S / n_i) + 1

where S is the count of documents in the set and n_i is the count of the documents in which
the i-th feature occurs.
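As a worked illustration of the weight computation, the following Python sketch evaluates a Rocchio score, an idf value, and their product for a single term. Several points are assumptions rather than part of the claim: documents are token lists, the negative-document term is subtracted with a coefficient beta as in the standard Rocchio form, alpha and beta default to 1.0 because no values are given, and "the set" used for S is taken to be the positive and negative documents together. All function names are hypothetical.

```python
import math
from typing import List

Doc = List[str]  # assumed representation: a document is a list of tokens


def rocchio(term: str, text: Doc, positive: List[Doc], negative: List[Doc],
            alpha: float = 1.0, beta: float = 1.0) -> float:
    # TF_{i,Text}: frequency of the term in the information-need description.
    tf_text = text.count(term)
    # Sums of TF_{ij}; documents lacking the term contribute zero.
    pos_sum = sum(d.count(term) for d in positive)
    neg_sum = sum(d.count(term) for d in negative)
    r, n = len(positive), len(negative)
    # Average over R and N; an empty set contributes nothing (assumption).
    pos_part = alpha * pos_sum / r if r else 0.0
    neg_part = beta * neg_sum / n if n else 0.0
    return tf_text + pos_part - neg_part


def idf(term: str, docs: List[Doc]) -> float:
    # n_i: number of documents in which the term occurs; S = len(docs).
    n_i = sum(1 for d in docs if term in d)
    if n_i == 0:
        raise ValueError("term does not occur in any document")
    return math.log2(len(docs) / n_i) + 1


def term_weight(term: str, text: Doc, positive: List[Doc],
                negative: List[Doc]) -> float:
    # w_i = Rocchio_i * idf_i, with S taken over positive + negative documents.
    return rocchio(term, text, positive, negative) * idf(term, positive + negative)


# Hypothetical example data.
text = ["svm"]
pos = [["svm", "svm", "kernel"], ["svm"]]
neg = [["kernel", "football"]]
w_svm = term_weight("svm", text, pos, neg)
r_kernel = rocchio("kernel", text, pos, neg)
```

With this data, "svm" has a Rocchio score of 1 + 3/2 = 2.5 and occurs in 2 of 3 documents, while "kernel" scores 0 + 1/2 - 1/1 = -0.5.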



7. A system for filtering documents, comprising: 

a computer coupled to a network wherein said computer receives documents over said
network and transmits documents to an individual user over said network, wherein said computer:

receiving unlabeled and labeled documents;

extracting a set of features from each document; 

learning a model from the example documents marked as positive or negative with 
respect to a category wherein said model scores documents to evaluate a degree of membership 
in said category; 

evaluating the performance of model settings on example documents;

applying an adjustment algorithm that provides a threshold value \theta_{new};

applying a scoring function that computes score, a value assigned to a document by the
learnt model, and classifies the document based on the sign of the following equation:

Class(X) = Sign(score - \theta_{new}).

8. A system as in claim 7, wherein:

said model is a support vector machine determined by training data from labeled example 
documents. 



9. A system as in claim 7, wherein:

said model comprises a list of terms and weights extracted from labeled example
documents.



10. The system of claim 7, wherein:

said model comprises a list of terms and corresponding weights and a threshold value 
determined by: 

extracting terms and features from the positive and negative documents; 
ranking terms and features; 

selecting a subset of terms and features from the ranked terms and features; 
assigning a weight w_i for each term and feature;
setting a threshold \theta for the model to zero.



11. The system of claim 10, wherein:

said terms and features are ranked in decreasing order of their Rocchio score calculated 
as follows: 



Rocchio_i = TF_{i,Text} + \alpha \cdot \frac{1}{R} \sum_{D_j \in D_R \,\&\, W_i \in D_j} TF_{ij} - \beta \cdot \frac{1}{N} \sum_{D_j \in D_N \,\&\, W_i \in D_j} TF_{ij}

in which

TF_{i,Text}: the frequency of feature i in the text description of the information need
TF_{ij}: the number of occurrences of feature i in document j
D_R: positive document set
D_N: negative document set

R: the number of positive documents (i.e., the size of D_R)
N: the number of negative documents



12. The system of claim 10, wherein:

said terms and features are assigned a weight as follows: 

w_i = Rocchio_i \cdot idf_i, calculated as:

Rocchio_i = TF_{i,Text} + \alpha \cdot \frac{1}{R} \sum_{D_j \in D_R \,\&\, W_i \in D_j} TF_{ij} - \beta \cdot \frac{1}{N} \sum_{D_j \in D_N \,\&\, W_i \in D_j} TF_{ij}

where

TF_{i,Text}: the frequency of feature i in the text description of the information need
TF_{ij}: the number of occurrences of feature i in document j
D_R: positive document set
D_N: negative document set

R: the number of positive documents (i.e., the size of D_R)
N: the number of negative documents



and idf_i is calculated as follows:

idf_i = \log_2(N / n_i) + 1

where N is the count of documents in the set and n_i is the count of the documents in which
the i-th feature occurs.

13. A method for retrieving information in response to a request, comprising: 

receiving unlabeled and labeled documents; 
extracting a set of features from each document; 

learning a model from example documents marked as positive or negative with respect to 
said request wherein said model scores documents to evaluate a degree of membership in a 
group responsive to said request; 

evaluating the performance of model settings on example documents; 

applying an adjustment algorithm that provides a threshold value \theta_{new};

applying a scoring function that computes score, a value assigned to a document by the
learnt model, and classifies the document based on the sign of the following equation:

Class(X) = Sign(score - \theta_{new}).

14. The method according to claim 13, wherein: 

said model is a support vector machine determined by training data from labeled example 
documents. 

15. The method according to claim 13, wherein: 

said model comprises a list of terms and weights extracted from labeled example 
documents. 

16. The method according to claim 13, wherein:

said model comprises a list of terms and corresponding weights and a threshold value 
determined by: 

extracting terms and features from the positive and negative documents; 
ranking terms and features; 

selecting a subset of terms and features from the ranked terms and features; 
assigning a weight w_i for each term and feature;
setting a threshold \theta for the model to zero.

17. The method according to claim 16, wherein:

said terms and features are ranked in decreasing order of their Rocchio score calculated
as follows:

Rocchio_i = TF_{i,Text} + \alpha \cdot \frac{1}{R} \sum_{D_j \in D_R \,\&\, W_i \in D_j} TF_{ij} - \beta \cdot \frac{1}{N} \sum_{D_j \in D_N \,\&\, W_i \in D_j} TF_{ij}

in which

TF_{i,Text}: the frequency of feature i in the text description of the information need
TF_{ij}: the number of occurrences of feature i in document j
D_R: positive document set
D_N: negative document set

R: the number of positive documents (i.e., the size of D_R)
N: the number of negative documents

18. The method according to claim 16, wherein:

said terms and features are assigned a weight as follows:
w_i = Rocchio_i \cdot idf_i, calculated as:

Rocchio_i = TF_{i,Text} + \alpha \cdot \frac{1}{R} \sum_{D_j \in D_R \,\&\, W_i \in D_j} TF_{ij} - \beta \cdot \frac{1}{N} \sum_{D_j \in D_N \,\&\, W_i \in D_j} TF_{ij}

where

TF_{i,Text}: the frequency of feature i in the text description of the information need
TF_{ij}: the number of occurrences of feature i in document j
D_R: positive document set
D_N: negative document set

R: the number of positive documents (i.e., the size of D_R)
N: the number of negative documents



and idf_i is calculated as follows:

idf_i = \log_2(S / n_i) + 1

where S is the count of documents in the set and n_i is the count of the documents in which
the i-th feature occurs.